Machine Learning in Public Health

Lecture 1: What is Machine Learning?

Dr. Yang Feng

Today’s agenda

About the instructor

To learn more about me

Brilliant Course Assistants

Yuyu (Ruby) Chen is a doctoral student at NYU School of Global Public Health specializing in Biostatistics. She was involved in multiple collaborative projects and consulting tasks, including Bayesian Adaptive Platform clinical trials, meta-analysis, machine learning and several longitudinal/cohort studies during her time at NYU. She is interested in using data and novel methods to address public health issues and find optimal clinical solutions with statistical approaches.

Brilliant Course Assistants

Yu Meng is a second-year student in the MS Biostatistics program.

I’m currently working on the association between health care coverage and several physical health risk factors. I love traveling, and I love snow days. Hope we can have a great semester!

Brilliant Course Assistants

Yuan Zhao is a third-year PhD student in Epidemiology. Her research mainly focuses on causal inference using the targeted maximum likelihood estimation (TMLE) framework and on machine learning to predict hospital admission. She’s especially interested in applying social determinants to improve algorithmic fairness/interpretability using health care data.

Brilliant Course Assistants

Jianan (Zoe) Zhu is currently a second-year MS student in Biostatistics at GPH.

My research interest is machine learning with applications to public health. In my spare time, I like to enjoy various cuisines in NYC.

Let’s start from “Statistics”

21st century: big data!

Machine learning

Definitions from CS community

Structured vs. unstructured data

Machine learning examples

Supervised learning paradigm

Supervised Learning (Training)

Supervised Learning (Prediction on Test Data)

First supervised learning example: diamond price prediction

Classification or Regression?

Second example: cancer diagnosis (benign, malignant)

Classification or Regression?

Last example: AI vs. doctors

Unsupervised Learning

Supervised vs. Unsupervised Learning

Syllabus highlights (cont.)

What is this course about?

Achievements after taking this course

Structure of the Course

My expectations

Motivating Example: Predicting Income

Predicting Income using Years of Education.

A General Regression Formulation

Example: Predicting Income (II)

Why Estimate \(f\)?

Goal No. 1: Prediction

Why Estimate \(f\)?

Goal No. 2: Inference

How Do We Estimate \(f\)?

\[income \approx \beta_0 + \beta_1 \times education + \beta_2 \times seniority\]
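The linear model above can be fit by ordinary least squares. The sketch below is only an illustration: the data are simulated, and the coefficient values, noise level, and variable ranges are assumptions made up for the example, not figures from the lecture's income data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated predictors (hypothetical ranges, not the lecture's data).
education = rng.uniform(10, 22, size=n)   # years of education
seniority = rng.uniform(20, 180, size=n)  # seniority score

# Hypothetical "true" coefficients (beta_0, beta_1, beta_2) used to simulate income.
beta_true = np.array([20.0, 3.0, 0.1])
noise = rng.normal(0.0, 2.0, size=n)
income = (beta_true[0]
          + beta_true[1] * education
          + beta_true[2] * seniority
          + noise)

# Design matrix with a column of ones for the intercept beta_0.
X = np.column_stack([np.ones(n), education, seniority])

# Least-squares estimate of (beta_0, beta_1, beta_2).
beta_hat, *_ = np.linalg.lstsq(X, income, rcond=None)
print("estimated coefficients:", beta_hat)
```

With enough observations the estimates should land close to the values used in the simulation, which is the sense in which a parametric method reduces estimating \(f\) to estimating a few coefficients.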

How Do We Estimate \(f\)?

Thin-plate spline: the fitted surface can be smooth or rough, depending on how flexible the fit is allowed to be.

Flexibility vs. Interpretability

No Free Lunch Theorem

Measuring Quality of Fit

\[MSE_{training} = \frac{1}{n}\sum_{i=1}^n (Y_i - \hat Y_i)^2\]
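As a small worked example of the training MSE formula, the sketch below averages the squared differences between observed responses \(Y_i\) and fitted values \(\hat Y_i\). The helper name `mse` and the four data points are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error: the average of (Y_i - Y_hat_i)^2 over all observations."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))

# Toy training responses and fitted values (made up for illustration).
y_train = [3.0, 5.0, 7.0, 9.0]
y_fitted = [2.5, 5.5, 7.0, 10.0]

# Squared errors are 0.25, 0.25, 0.0, 1.0, so the average is 1.5 / 4 = 0.375.
print(mse(y_train, y_fitted))
```

The same function applied to held-out observations gives the test MSE, which is usually the quantity of interest.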

MSE vs. Flexibility

Decompose the test MSE

For a test observation \(x_0\), we want to minimize the expected test MSE.

\[E\left[(y_0 - \hat f(x_0))^2\right] = Var(\hat f(x_0)) + [Bias(\hat f(x_0))]^2 + Var(\epsilon)\]
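The decomposition can be checked numerically with a small simulation. The sketch below is illustrative only: the true function \(f(x) = \sin(2x)\), the noise level, the sample size, and the choice of a straight-line fit are all assumptions made for the example. It repeatedly draws a training set, fits the line, predicts at \(x_0\), and compares the average squared test error with the sum of the three terms.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Hypothetical true regression function (an assumption for this demo).
    return np.sin(2 * x)

sigma = 0.5   # standard deviation of the irreducible noise epsilon
x0 = 1.0      # test point
n = 30        # training set size
degree = 1    # straight-line fit: low flexibility, so noticeable bias
reps = 5000   # number of simulated training sets

preds = np.empty(reps)
sq_errors = np.empty(reps)
for r in range(reps):
    # Draw a fresh training set and fit a polynomial of the chosen degree.
    x = rng.uniform(0, 2, size=n)
    y = f(x) + rng.normal(0, sigma, size=n)
    coefs = np.polyfit(x, y, deg=degree)
    preds[r] = np.polyval(coefs, x0)
    # Draw an independent test response at x0 and record the squared error.
    y0 = f(x0) + rng.normal(0, sigma)
    sq_errors[r] = (y0 - preds[r]) ** 2

variance = preds.var()                    # Var(f_hat(x0))
bias_sq = (preds.mean() - f(x0)) ** 2     # [Bias(f_hat(x0))]^2
expected_mse = sq_errors.mean()           # Monte Carlo estimate of E[(y0 - f_hat(x0))^2]
decomposition = variance + bias_sq + sigma ** 2

print("expected test MSE:", expected_mse)
print("variance + bias^2 + Var(eps):", decomposition)
```

The two printed numbers should agree up to simulation error. Raising `degree` typically shrinks the bias term and inflates the variance term, while \(Var(\epsilon)\) stays fixed no matter how \(f\) is estimated.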

Bias and Variance Tradeoff